Knowledge Discovery Using Heuristics
Author
Abstract
Uninformed (blind) search, which in the worst case processes and evaluates every node of a search space, is not realistic for extracting knowledge from large data sets because of time constraints that are closely tied to the dimensionality of the data. In general, the search space grows exponentially with problem size, which limits the size of problems that can realistically be solved with exact techniques such as exhaustive search. Heuristic techniques offer an alternative and can help in areas where classical search methods fail. The word “heuristic” comes from Greek and means “to know,” “to find,” “to discover” or “to guide an investigation.” Specifically, “Heuristics are techniques which seek good (near-optimal) solutions at a reasonable computational cost without being able to guarantee either feasibility or optimality, or even in many cases to state how close to optimality a particular feasible solution is” (Russell & Norvig, 1995). The term refers to any technique that improves average-case performance on a problem-solving task but does not necessarily improve worst-case performance. Heuristic techniques search the problem space “intelligently,” using knowledge of previously tried solutions to guide the search into fruitful areas of the search space. Often, search spaces are so large that only heuristic search can produce a solution in reasonable time. These techniques improve the efficiency of a search process, sometimes by sacrificing the completeness or the optimality of the solution. In this setting, heuristics are estimates of the distance remaining to the goal, computed from domain knowledge. This domain knowledge guides the search and can be represented in a variety of formats, including patterns, networks, trees, graphs, version spaces, rule sets, equations, and contingency tables.
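To make the idea of a heuristic as an estimate of the remaining distance to the goal concrete, the following is a minimal sketch of A* search on a small grid. The grid, the function names `a_star`, `manhattan` and `zero`, and the unit step cost are illustrative assumptions and are not taken from the work summarized here. Passing the zero heuristic turns the same routine into an uninformed uniform-cost search, which allows a rough comparison of how many nodes each variant expands.

```python
import heapq

def a_star(grid, start, goal, heuristic):
    """Return (path_cost, nodes_expanded); path_cost is None if goal is unreachable.

    grid is a list of rows, 0 = free cell, 1 = wall (hypothetical encoding).
    heuristic(cell, goal) must never overestimate the true remaining cost.
    """
    rows, cols = len(grid), len(grid[0])
    best_cost = {start: 0}
    # Frontier entries are (f = g + h, g, cell); heapq pops the smallest f first.
    frontier = [(heuristic(start, goal), 0, start)]
    expanded = 0
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if g > best_cost.get(cell, float("inf")):
            continue  # stale entry: a cheaper route to this cell was found later
        expanded += 1
        if cell == goal:
            return g, expanded
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g + 1  # assumed unit cost per step
                if new_g < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_g
                    f = new_g + heuristic((nr, nc), goal)
                    heapq.heappush(frontier, (f, new_g, (nr, nc)))
    return None, expanded

def manhattan(cell, goal):
    # Domain knowledge: on a 4-connected grid, no path is shorter than this.
    return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

def zero(cell, goal):
    # No domain knowledge: A* degenerates to uninformed uniform-cost search.
    return 0

GRID = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

informed_cost, informed_expanded = a_star(GRID, (0, 0), (3, 3), manhattan)
blind_cost, blind_expanded = a_star(GRID, (0, 0), (3, 3), zero)
print(informed_cost, informed_expanded, blind_cost, blind_expanded)
```

Both calls return the same path cost, because the Manhattan estimate never overestimates; the informed variant simply has less of the grid to examine before it reaches the goal.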
With regard to heuristics, there are a number of generic approaches, such as greedy search, A* search, tabu search, simulated annealing, and population-based heuristics. Heuristic methods can be applied to a wide class of problems in optimization, classification, statistics, recognition, planning and design. Of special interest is the integration of heuristic search principles with dynamic processes in which data become available in successive stages, in which data and inputs are subject to uncertainty, or in which the data sets are large-scale. This integration is a vehicle for generating data-driven hypotheses. The kind of knowledge produced, and the heuristic search algorithm selected, will reflect the nature of the data analysis task. The hypotheses are represented as sets of decision rules, and the extracted rules are expressed in terms of rough sets. Rough sets were selected because of the nature of our data sets. From a mathematical point of view, the problems can be formulated in terms of the well-known minimal set cover problem, which is a combinatorial optimization problem. Traditional methods for combinatorial optimization problems are not appropriate here for several reasons. The underlying problems are NP-hard, so exact methods would be costly to use given the size of the data sets. Also, since large data sets are dynamic in nature, adding new data would require running the traditional combinatorial approach again. The techniques used to solve these difficult optimization problems have slowly evolved from constructive methods, like uninformed search, to local search techniques and to population-based algorithms. Our research goal was to blend population-based algorithms with methods for dealing with uncertainty in order to induce rules from large data sets.
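As an illustration of the set cover formulation, the sketch below applies the classic greedy heuristic: repeatedly pick the candidate that covers the most still-uncovered elements. This is a deliberately simple stand-in, not the population-based and rough-set method described above. The function name `greedy_set_cover`, the record identifiers and the candidate rule names are hypothetical; each candidate rule is assumed to be summarized by the set of records it covers.

```python
def greedy_set_cover(universe, subsets):
    """Return a list of subset names whose union covers the universe.

    universe: iterable of elements to cover (e.g. record identifiers).
    subsets:  dict mapping a name (e.g. a candidate rule) to the set it covers.
    The result is a cover, but it is not guaranteed to be a minimal one.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Greedy step: take the subset that covers the most uncovered elements.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            raise ValueError("the given subsets do not cover the universe")
        chosen.append(best)
        uncovered -= gained
    return chosen

# Hypothetical data: six records and four candidate rules.
records = {1, 2, 3, 4, 5, 6}
candidate_rules = {
    "rule_a": {1, 2, 3, 4},
    "rule_b": {3, 4, 5},
    "rule_c": {5, 6},
    "rule_d": {1, 6},
}

cover = greedy_set_cover(records, candidate_rules)
print(cover)
```

The greedy choice runs in time polynomial in the input size, which is the usual reason to prefer it over exhaustive enumeration of all subsets of candidates, at the price of possibly returning a cover larger than the optimum.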
Similar resources
Using Heuristics to Facilitate Discovery Learning in a Simulation Learning Environment in a Physics Domain
This article describes a study into the role of offering heuristic support in facilitating discovery learning in simulation-based learning. The study compares two simulation-based learning environments that use heuristics to support the learners in discovering the physics domain of collisions. In one learning environment heuristics are only used to provide the learner with guidance derived from ...
Burke, Edmund and MacCarthy, Bart L. and Petrovic, Sanja and Qu, Rong (2002) Knowledge discovery in hyper-heuristic using case-based reasoning on course timetabling. In: International Conference on the Practice and Theory of Automated Timetabling, Aug 2002, Gent, Belgium
This paper presents a new hyper-heuristic method using Case-Based Reasoning (CBR) for solving course timetabling problems. The term Hyperheuristics has recently been employed to refer to “heuristics that choose heuristics” rather than heuristics that operate directly on given problems. One of the overriding motivations of hyper-heuristic methods is the attempt to develop techniques that can ope...
Understanding the Relationship between Scheduling Problem Structure and Heuristic Performance using Knowledge Discovery
Using a knowledge discovery approach, we seek insights into the relationships between problem structure and the effectiveness of scheduling heuristics. A large collection of 75,000 instances of the single machine early/tardy scheduling problem is generated, characterized by six features, and used to explore the performance of two common scheduling heuristics. The best heuristic is selected usin...
Heuristic Knowledge Discovery for Archaeological Data Using Genetic Algorithms and Rough Sets
The goal of this research is to investigate and develop heuristic tools in order to extract meaningful knowledge from large-scale archaeological data sets. Database queries help us to answer only simple questions. Intelligent search tools integrate heuristics with knowledge discovery tools and they use data to build models of the real world. We would like to investigate these tools and combi...
Knowledge Discovery in a Hyper-heuristic for Course Timetabling Using Case-Based Reasoning
This paper presents a new hyper-heuristic method using Case-Based Reasoning (CBR) for solving course timetabling problems. The term Hyperheuristics has recently been employed to refer to “heuristics that choose heuristics” rather than heuristics that operate directly on given problems. One of the overriding motivations of hyper-heuristic methods is the attempt to develop techniques that can ope...
Pseudo-Independent Models and Decision Theoretic Knowledge Discovery
Graphical models such as Bayesian networks (BNs) (Pearl, 1988; Jensen & Nielsen, 2007) and decomposable Markov networks (DMNs) (Xiang, Wong, & Cercone, 1997) have been widely applied to probabilistic reasoning in intelligent systems. Knowledge representation using such models for a simple problem domain is illustrated in Figure 1: Virus can damage computer files and so can a power glitch. Powe...